AITopics

Country:

Asia > China > Zhejiang Province > Hangzhou (0.05)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Infrastructure & Services (0.84)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing Systems, Feb-10-2026, 22:06:19 GMT

96ca792fddef7c1e3366c405022463cb-Paper-Conference.pdf

evaluation, mdp, point mdp

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.04)

Genre:

Research Report (0.68)
Instructional Material (0.46)

Industry:

Health & Medicine (0.68)
Transportation > Infrastructure & Services (0.50)
Transportation > Ground > Road (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing Systems, Feb-7-2026, 22:12:59 GMT

29e48b79ae6fc68e9b6480b677453586-Paper.pdf

algorithm, attendlight, intersection

Country:

North America > United States > North Carolina > Wake County > Cary (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Industry:

Health & Medicine (0.68)
Transportation > Infrastructure & Services (0.50)
Transportation > Ground > Road (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems, Dec-24-2025, 20:28:11 GMT

The Impact of Task Underspecification in Evaluating Deep Reinforcement Learning

Evaluations of Deep Reinforcement Learning (DRL) methods are integral to the field's scientific progress. Beyond designing DRL methods for general intelligence, designing task-specific methods is becoming increasingly prominent for real-world applications. In these settings, the standard evaluation practice uses a few instances of Markov Decision Processes (MDPs) to represent the task. However, many tasks induce a large family of MDPs owing to variations in the underlying environment, particularly in real-world contexts. For example, in traffic signal control, variations may stem from intersection geometries and traffic flow levels. Evaluating on a few selected MDP instances may thus inadvertently lead to overfitting, and it lacks the statistical power to draw conclusions about a method's true performance across the family. In this article, we augment DRL evaluations to consider parameterized families of MDPs. We show that, compared with evaluating DRL methods on select MDP instances, evaluating on the MDP family often yields a substantially different relative ranking of methods, casting doubt on which methods should be considered state-of-the-art.
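The ranking effect described in this abstract can be illustrated with a minimal sketch. The two methods, their score functions, and the single "traffic flow" parameter below are hypothetical stand-ins chosen for illustration; none of the numbers come from the paper.

```python
from statistics import mean

# Hypothetical per-MDP scores (higher is better). These are illustrative
# assumptions, not results reported in the paper.
def score_method_a(flow: float) -> float:
    # Method A does well at low traffic flow and degrades as flow grows.
    return 1.0 - 0.8 * flow

def score_method_b(flow: float) -> float:
    # Method B is mediocre but insensitive to the flow level.
    return 0.7

METHODS = {"A": score_method_a, "B": score_method_b}

def rank_methods(flow_levels):
    """Rank methods by mean score over the given MDP instances, best first."""
    averages = {
        name: mean(score(flow) for flow in flow_levels)
        for name, score in METHODS.items()
    }
    return sorted(averages, key=averages.get, reverse=True)

# A few hand-picked "point" MDPs versus a grid over the whole parameter range.
point_instances = [0.1, 0.2]
mdp_family = [i / 10 for i in range(11)]

point_ranking = rank_methods(point_instances)
family_ranking = rank_methods(mdp_family)
print("point instances:", point_ranking)
print("MDP family:     ", family_ranking)
```

In this toy setup the two evaluations disagree on which method comes first, which is the kind of ranking flip the abstract warns about when a task is represented by only a few MDP instances.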

deep reinforcement learning, name change, task underspecification

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Neural Information Processing Systems, Dec-23-2025, 21:37:51 GMT

AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control

We propose AttendLight, an end-to-end Reinforcement Learning (RL) algorithm for the problem of traffic signal control. Previous approaches for this problem have the shortcoming that they require training for each new intersection with a different structure or traffic flow distribution. AttendLight solves this issue by training a single, universal model for intersections with any number of roads, lanes, phases (possible signals), and traffic flow. To this end, we propose a deep RL model which incorporates two attention models. The first attention model is introduced to handle different numbers of roads-lanes; and the second attention model is intended for enabling decision-making with any number of phases in an intersection. As a result, our proposed model works for any intersection configuration, as long as a similar configuration is represented in the training set. Experiments were conducted with both synthetic and real-world standard benchmark datasets.
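To show why attention suits intersections with differing numbers of lanes, here is a minimal sketch of dot-product attention pooling over a variable-length set of lane feature vectors. It is not AttendLight's actual architecture: the feature layout, the fixed `query` vector (which would be learned in practice), and the function names are assumptions for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(lane_features, query):
    """Weight each lane by its dot product with the query and return the
    weighted sum, so the output size does not depend on the lane count."""
    scores = [sum(q * x for q, x in zip(query, feat)) for feat in lane_features]
    weights = softmax(scores)
    dim = len(query)
    return [
        sum(w * feat[d] for w, feat in zip(weights, lane_features))
        for d in range(dim)
    ]

# Hypothetical query vector; in a trained model this would be learned.
query = [0.5, -0.2, 0.1]

# Two intersections with different numbers of lanes (3-dimensional features).
three_lanes = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 1.0, 1.0]]
five_lanes = three_lanes + [[2.0, 2.0, 0.0], [0.5, 0.5, 0.5]]

pooled_small = attention_pool(three_lanes, query)
pooled_large = attention_pool(five_lanes, query)
print(len(pooled_small), len(pooled_large))
```

Both intersections map to a vector of the same length, so a single downstream network could consume either one without retraining for each layout.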

attendlight, intersection, universal attention-based reinforcement learning model

Industry:

Transportation > Infrastructure & Services (0.66)
Transportation > Ground > Road (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.63)

arXiv.org Artificial Intelligence, Dec-12-2025

VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture

Wang, Maonan, Chen, Yirong, Pang, Aoyu, Cai, Yuxin, Chen, Chung Shue, Kan, Yuheng, Pun, Man-On

Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods, ranging from rule-based heuristics to reinforcement learning (RL), often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
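The dual-branch routing idea can be sketched as a small dispatcher. In the paper an LLM acts as the meta-controller and the branches are an RL policy and a multi-agent reasoning process; here all three are replaced by simple hypothetical rules (`meta_control`, `fast_policy`, `reasoning_branch`), so this shows only the control flow, not VLMLight itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class SceneSummary:
    """Hypothetical summary of what was perceived at an intersection."""
    queue_lengths: Dict[str, int]           # phase name -> vehicles waiting
    emergency_phase: Optional[str] = None   # phase serving an emergency vehicle

def fast_policy(scene: SceneSummary) -> str:
    # Stand-in for the fast RL policy: serve the phase with the longest queue.
    return max(scene.queue_lengths, key=scene.queue_lengths.get)

def reasoning_branch(scene: SceneSummary) -> str:
    # Stand-in for the structured reasoning branch: give priority to the
    # phase that clears the emergency vehicle.
    return scene.emergency_phase

def meta_control(scene: SceneSummary) -> Tuple[str, str]:
    """Route critical scenes to the reasoning branch, everything else to the
    fast policy. Returns (branch name, chosen phase)."""
    if scene.emergency_phase is not None:
        return "reasoning", reasoning_branch(scene)
    return "fast", fast_policy(scene)

routine = SceneSummary(queue_lengths={"NS": 8, "EW": 3})
critical = SceneSummary(queue_lengths={"NS": 8, "EW": 3}, emergency_phase="EW")

print(meta_control(routine))
print(meta_control(critical))
```

Keeping the slow branch off the routine path is what lets a design like this stay close to real-time performance in standard conditions.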

large language model, machine learning, vlmlight

2505.19486

Country: Asia > China (0.47)

Genre: Research Report > New Finding (0.67)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Dec-11-2025

MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization

Wu, Boyuan

Cooperative Multi-Agent Reinforcement Learning (MARL) faces two major design bottlenecks: crafting dense reward functions and constructing curricula that avoid local optima in high-dimensional, non-stationary environments. Existing approaches rely on fixed heuristics or use Large Language Models (LLMs) directly in the control loop, which is costly and unsuitable for real-time systems. We propose MAESTRO (Multi-Agent Environment Shaping through Task and Reward Optimization), a framework that moves the LLM outside the execution loop and uses it as an offline training architect. MAESTRO introduces two generative components: (i) a semantic curriculum generator that creates diverse, performance-driven traffic scenarios, and (ii) an automated reward synthesizer that produces executable Python reward functions adapted to evolving curriculum difficulty. These components guide a standard MARL backbone (MADDPG) without increasing inference cost at deployment. We evaluate MAESTRO on large-scale traffic signal control (Hangzhou, 16 intersections) and conduct controlled ablations. Results show that combining LLM-generated curricula with LLM-generated reward shaping yields improved performance and stability. Across four seeds, the full system achieves 4.0% higher mean return (163.26 vs. 156.93) and 2.2% better risk-adjusted performance (Sharpe 1.53 vs. 0.70) over a strong curriculum baseline. These findings highlight LLMs as effective high-level designers for cooperative MARL training.

large language model, machine learning, natural language

2511.19253

Country:

North America > Canada (0.28)
Asia > China > Zhejiang Province > Hangzhou (0.24)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.93)
Transportation > Ground > Road (0.49)
Transportation > Infrastructure & Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence, Dec-9-2025

Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning

Önür, Giray, Dabiri, Azita, De Schutter, Bart

Effective traffic control is essential for mitigating congestion in transportation networks. Conventional traffic management strategies, including route guidance, ramp metering, and traffic signal control, often rely on state feedback controllers, used for their simplicity and reactivity; however, they lack the adaptability required to cope with complex and time-varying traffic dynamics. This paper proposes a multi-agent reinforcement learning framework in which each agent adaptively tunes the parameters of a state feedback traffic controller, combining the reactivity of state feedback controllers with the adaptability of reinforcement learning. By tuning parameters at a lower frequency rather than directly determining control actions at a high frequency, the reinforcement learning agents achieve improved training efficiency while maintaining adaptability to varying traffic conditions. The multi-agent structure further enhances system robustness, as local controllers can operate independently in the event of partial failures. The proposed framework is evaluated on a simulated multi-class transportation network under varying traffic conditions. Results show that the proposed multi-agent framework outperforms the no control and fixed-parameter state feedback control cases, while performing on par with the single-agent RL-based adaptive state feedback control, with a much better resilience to partial failures.
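The two-timescale idea in this abstract, a feedback law acting every step while its parameter is retuned less often, can be sketched as follows. This is a toy single-controller illustration under stated assumptions: the integral-type feedback law, the rule-based `tune_gain` (a placeholder for a learned RL policy), the plant dynamics, and all constants are hypothetical and not taken from the paper.

```python
def tune_gain(recent_errors):
    """Placeholder for a learned tuning policy: use a larger gain when the
    recent tracking error has been large. Thresholds are illustrative."""
    if not recent_errors:
        return 0.5
    avg_abs = sum(abs(e) for e in recent_errors) / len(recent_errors)
    return 1.0 if avg_abs > 5.0 else 0.5

def run(steps=20, tune_every=5, target_density=30.0):
    """Simulate a feedback controller whose gain is retuned every
    `tune_every` steps. Returns (number of gain updates, applied rates)."""
    density, rate, gain = 50.0, 10.0, 0.5
    errors, rates = [], []
    gain_updates = 0
    for t in range(steps):
        # Slow loop: the tuner acts only at a lower frequency.
        if t % tune_every == 0:
            gain = tune_gain(errors[-tune_every:])
            gain_updates += 1
        # Fast loop: integral-type state feedback on the density error.
        error = target_density - density
        rate = max(0.0, rate + gain * error)
        errors.append(error)
        rates.append(rate)
        # Toy plant (assumed): admitted flow raises density, a constant
        # outflow lowers it.
        density = max(0.0, density + 0.5 * rate - 6.0)
    return gain_updates, rates

gain_updates, rates = run()
print(gain_updates, [round(r, 2) for r in rates[:5]])
```

With 20 control steps and a tuning period of 5, the tuner is consulted only four times, which reflects the abstract's point that tuning at a lower frequency leaves far fewer decisions for the learning agent than choosing every control action directly.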

controller, machine learning, reinforcement learning

2512.07417

Country: Europe > Netherlands (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Dec-5-2025

Semi Centralized Training Decentralized Execution Architecture for Multi Agent Deep Reinforcement Learning in Traffic Signal Control

Yazdani, Pouria, Rezaali, Arash, Abdoos, Monireh

Traffic congestion is a major and complex challenge for cities worldwide, driven by the rapid growth of urbanization and vehicle ownership. Longer commute times, excessive fuel consumption, and elevated air pollution levels are direct consequences of over-saturated roads. For instance, according to the 2024 INRIX Global Traffic Scorecard, individual commuters in Istanbul, New York City, and Chicago experienced total annual delays of about 105, 102, and 102 hours, respectively, underscoring the magnitude of intersection-driven delays in major metros (INRIX). Within urban networks, signalized intersections are the dominant bottlenecks: the policies implemented at these intersections allocate scarce space-time among competing traffic streams and therefore largely determine corridor-level delay, queues, and emissions. Reinforcement learning (RL) has become a standard approach to adaptive traffic signal control (ATSC), treating phase selection and timing as a sequential decision problem that optimizes long-horizon objectives such as delay, throughput, and emissions under nonstationary demand (Yau et al., 2017). Deep RL (DRL) extends this by using function approximation to digest rich state representations, from detector queues to trajectories and graph-structured networks, enabling policies that generalize across varying traffic flows and topologies (Zhao et al., 2024). Collectively, this body of work motivates moving beyond single-intersection controllers toward coordinated, network-level solutions and sets the stage for multi-agent formulations.

artificial intelligence, machine learning, reinforcement learning

2512.04653

Country:

North America > United States > New York (0.24)
North America > United States > Illinois > Cook County > Chicago (0.24)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.24)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.24)

Genre: Research Report (0.81)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.34)